As a qubit is a two-level quantum system whose state space is spanned by $|0\rangle$, $|1\rangle$, so a qudit is a $d$-level
quantum system whose state space is spanned by $|0\rangle, \ldots, |d-1\rangle$. Quantum computation has stimulated much
recent interest in algorithms factoring unitary evolutions of an $n$-qubit state space into component two-particle
unitary evolutions. In the absence of symmetry, Shende, Markov and Bullock use Sard's theorem to prove that at
least $C4^n$ two-qubit unitary evolutions are required, while Vartiainen, Möttönen, and Salomaa (VMS) use the QR
matrix factorization and Gray codes in an optimal order construction involving two-particle evolutions. In this
work, we note that Sard's theorem demands $Cd^{2n}$ two-qudit unitary evolutions to construct a generic (symmetry-less)
$n$-qudit evolution. However, the VMS result applied to virtual-qubits only recovers optimal order in the case
that $d$ is a power of two. We further construct a QR decomposition for $d$-multi-level quantum logics, proving
a sharp asymptotic of $\Theta(d^{2n})$ two-qudit gates and thus closing the complexity question for all $d$-level systems
($d$ finite). Gray codes are not required, and the optimal $\Theta(d^{2n})$ asymptotic also applies to gate libraries where
two-qudit interactions are restricted by a choice of certain architectures.
1 Introduction

The dominant theoretical model of quantum computation is the quantum circuit [1] acting on quantum bits, or
qubits [2]. A qubit is a two-level quantum system, whose complex Hilbert state space is spanned by kets $|0\rangle$ and
$|1\rangle$. The labels within the ket are evocative of classical computer logic. Yet using qubits, these logical values may
be placed in superposition, and moreover multiple qubits may be entangled. A qudit [3] is a generalization to
quantum computing of a classical multi-level logic [4]. We fix $d > 2$ throughout and consider the one-qudit state
space

$$\mathcal{H}(1,d) = \mathbb{C}|0\rangle \oplus \mathbb{C}|1\rangle \oplus \cdots \oplus \mathbb{C}|d-1\rangle. \tag{1}$$

The decomposition is taken to be Hermitian orthonormal, and the $n$-qudit state space then becomes
$\mathcal{H}(n,d) = \otimes_1^n \mathcal{H}(1,d) = \oplus_{c_1 c_2 \ldots c_n} \mathbb{C}|c_1 c_2 \cdots c_n\rangle$, with $c_1 c_2 \cdots c_n$ varying over all length $n$ integers in base $d$.
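The labelling of basis kets by length-$n$ integers in base $d$ can be made concrete with a short sketch. The helper names `index_to_dits` and `dits_to_index` are illustrative and not from the paper; the leftmost digit is assumed to be the most significant qudit.

```python
from itertools import product

def index_to_dits(index, d, n):
    """Return the length-n base-d digit tuple (c_1, ..., c_n) of a basis index."""
    dits = []
    for _ in range(n):
        index, digit = divmod(index, d)
        dits.append(digit)
    return tuple(reversed(dits))

def dits_to_index(dits, d):
    """Inverse map: read the digit tuple as an integer in base d."""
    index = 0
    for digit in dits:
        index = index * d + digit
    return index

# The d**n basis labels of a two-qutrit register, in increasing order.
labels = [index_to_dits(i, 3, 2) for i in range(3 ** 2)]
```

Under this convention the labels for two qutrits run from `(0, 0)` to `(2, 2)`, one per dimension of the $d^n$-dimensional state space.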

A quantum computation is a procedure that takes a classical input string encoded in a quantum data state, 
processes this state using operations allowed by the laws of quantum mechanics, and finally measures the state 
to produce a classical output string. The quantum processing can be realized as a unitary evolution on the state 
space. The exact universality theorem for quantum computation with qudits [6] states that any unitary evolution on
many qudits can be constructed to infinite precision using a finite sequence of single qudit and two-qudit unitaries 
or gates. Any such sequence of quantum gates that transforms classical input to classical output is known as a 
quantum algorithm. Of course, not all quantum algorithms are efficient. As most functions on bit-strings require 
exponentially many AND-OR-NOT gates, so too most unitary evolutions may only be realized with exponentially 
many quantum gates. Efficient quantum algorithms are usually defined as those using a number of single and two 
qudit gates whose size (complexity) is asymptotically bounded above by a polynomial in the number of qudits. 

We say a function $h(n) \in \Omega[f(n)]$ if there is a constant $C$ so that $h(n)$ is at least $Cf(n)$ for $n \gg 1$, and similarly
$h(n) \in O[f(n)]$ if there is a second $C$ so that $h(n)$ is at most $Cf(n)$. We say that $h(n) \in \Theta[f(n)]$ when both hold.
Several choices of gate libraries are used in quantum algorithms, but most admit asymptotically equivalent gate
counts. We concentrate on two-qudit gates [7]. Thus, the complexity of a unitary evolution $U$ is that number $\ell$ for
which we have a minimum length expression

$$U = U_{j_1 k_1} U_{j_2 k_2} \cdots U_{j_\ell k_\ell} \tag{2}$$

with each $U_{jk}$ a two-qudit ($d^2 \times d^2$) operator acting exclusively on qudits $j, k$. An efficient computation should
produce a family $\{U_n\}_{n=1}^{\infty}$ of unitary operators whose gate counts $\ell_{U_n} = h(n)$ satisfy $h(n) \in O(n^p)$, some $p > 0$. As an
example of this formalism, put $\omega = e^{2\pi i/2^n}$ and consider the $n$-qubit Fourier transform
$\mathcal{F}_n = \frac{1}{\sqrt{2^n}} \sum_{j,k=0}^{2^n-1} \omega^{jk}\, |k\rangle\langle j|$.

Known circuits for $\mathcal{F}_n$ require $O(n^2)$ gates [8], so that the Fourier transform is an efficient quantum computation.
The extension to qudits is likewise an efficient quantum computation [9].

It is typical to draw the factorization of $U$ in Equation (2) as a quantum circuit, representing each qudit with a
line and drawing a gate connecting qudits $j, k$ for each $U_{jk}$. Physical implementation of symmetry-less evolutions
is not practical when the number of qudits n is large, since the number of gates required scales exponentially 
in n. Yet circuits for generic unitaries are still of interest. First, they may improve subblocks of larger circuits 
through a process of peephole optimization: when many consecutive two-qudit gates act on a small collection of 
qudits, we compute the associated unitary evolution and substitute a circuit of the sort presented here in hopes 
of decreasing the total number of required operations. Second, they are useful in translating circuits from gate 
libraries that include three and multi-qudit gates to two-qudit gates when a physical system only conveniently 
allows for pairwise interactions. They may also be used to translate an arbitrary gate library into a fault-tolerant 
library of qudit gates [10]. Finally, we note that the symmetries that allow for polynomial-size quantum circuits are
not well-understood. Thus, producing efficient symmetry-less circuits may provide insights into general design 
principles that might also be useful in constructing or optimizing computations. 

For qubits, Shende, Markov and Bullock have shown that $\Omega(4^n)$ two-qubit gates are required, while a recent
Letter [11] provided an $O(4^n)$ construction. Thus we have a sharp asymptotic for symmetry-less $n$-qubit unitary
evolution: $\Theta(4^n)$ two-qubit gates are required. The result does not readily extend to qudits, even though qudit
systems may be employed to emulate qubit systems and conversely. The lower bound generalizes to $Cd^{2n}$ gates,
but naive emulations of the VMS circuit require asymptotically more gates than this. Indeed, the best prior
constructive upper bound is $O(n^2 d^{2n})$ two-qudit gates [12].

The main result of our work is a constructive proof that $\Theta(d^{2n})$ two-qudit gates are required to implement an
arbitrary $n$-qudit evolution without symmetry. En route, we also prove that $\Theta(d^n)$ two-qudit gates suffice for
$n$-qudit state-synthesis. The algorithm that produces the quantum circuit is a variant of the QR matrix-decomposition,
cf. [15], [16], [11]. Unlike an earlier qubit construction [11], it does not rely on a Gray code, either in base-two or
base $d$.

The paper is organized as follows. First in §2 we review the justification of the lower bound of $\Omega(d^{2n})$ and
then in §3 we discuss the inadequacy of qubit emulation of qudits. The remainder of the manuscript describes
an algorithm for constructing a circuit involving two-qudit gates, carefully showing that the number of gates is
$O(d^{2n})$. In §4 we define a controlled single qudit gate which applies a single qudit unitary conditioned on the
state of multiple control qudits. In particular, we show that a $k$-controlled one-qudit unitary may be implemented
in $O(k)$ two-qudit gates, given sufficient ancilla (helper) qudits. In §5 we describe our state-synthesis algorithm
and adapt it to a virtual Householder reflection using singly controlled one-qudit operators. In §6 we present our
qudit-native quantum circuit synthesis algorithm, and establish that it produces a universal circuit with at
most $O(d^{2n})$ two-qudit operations.

2 The Lower Bound 

The lower bound argument is similar to other lower bound arguments [13, 14] in quantum computing using Sard's
theorem from smooth topology. The theorem (loosely) states that almost no values of a smooth function are
critical values. A well-known corollary (e.g. [14]) then demands that for a smooth map $f : M \to N$ that carries an
$m$-dimensional manifold $M$ into an $n$-dimensional manifold $N$ for $m < n$, the set $\mathrm{image}(f)$ must be a measure zero
subset of $N$.

We first set some notation. By default, upper case letters indicate either matrices or unitary operators. We
use $I_\ell$ to denote an $\ell \times \ell$ identity matrix, and $A^\dagger = \bar{A}^T$ is the adjoint of $A$. Recall also the Lie theory notation
$U(q) = \{U \in \mathbb{C}^{q \times q} \,;\, UU^\dagger = I_q\}$. Suppose then that we consider an expression associated to a fixed circuit topology
for two-qudit gates. Namely, suppose we factor a $U \in U(d^n)$ as in Equation (2). Suppose moreover that we take $\ell$
and the tuples $(j_q, k_q)$ for $1 \le q \le \ell$ to be fixed. Then by varying the $U_{j_q k_q}$ in $U(d^2)$, we obtain a map of smooth
manifolds $f : [U(d^2)]^\ell \to U(d^n)$. Now generically, $\dim[U(q)] = q^2$. Hence the smooth function implicit in the
circuit diagram of Equation (2) carries a manifold of dimension $\ell d^4$ into a manifold of dimension $d^{2n}$. In order for
the image to not be measure zero for a fixed circuit diagram, we require $\ell \ge d^{2n-4}$. As there are only finitely many
circuit topologies holding fewer than $d^{2n-4}$ factors per Equation (2), we generically require $\Omega(d^{2n})$
gates of the two-qudit library to realize symmetry-less unitary evolutions within $U(d^n)$.
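The dimension count behind this bound can be written out explicitly. The display below only restates the argument of the preceding paragraph, with $\ell$ the number of two-qudit factors and $d$ held fixed.

```latex
\begin{align*}
\dim\,[U(d^2)]^{\ell} &= \ell\,(d^2)^2 = \ell\, d^4, &
\dim U(d^n) &= (d^n)^2 = d^{2n}, \\
\ell\, d^4 \ge d^{2n} &\iff \ell \ge d^{2n-4} = d^{-4}\, d^{2n}, &
&\text{hence } \ell \in \Omega(d^{2n}).
\end{align*}
```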

A similar invocation of Sard's theorem produces a lower bound on circuit sizes for state synthesis. Here, the
problem is to produce the most efficient possible circuit $U$ capable of realizing generic $|\psi\rangle \in \mathcal{H}(n,d)$ from a fixed
start-state typically chosen to be $|0\rangle$, i.e. building a small circuit for $U$ so that $|\psi\rangle = U|0\rangle$. We claim that circuits
for generic state-synthesis require $\Omega(d^n)$ gates. Indeed, $U|0\rangle$ is simply the first column of the matrix realization
of $U$, and taking the column of a matrix is a smooth map. Hence we apply the Sard's theorem argument above
to $f : [U(d^2)]^\ell \to \mathcal{H}(n,d)$, whence the result. Now a subcircuit of our universal unitary qudit-evolution circuit,
described in Equation (6), is also capable of solving the state synthesis problem in $(d^n - 1)/(d - 1) \in O(d^n)$ two-qudit
gates. Hence the qudit state-synthesis generically requires $\Theta(d^n)$ gates.

Theorem: The following asymptotics hold for $d$ multi-level quantum logic circuits. In each statement, $d$ is fixed
and the asymptotic is stated exclusively in terms of $n$.

1. Given a generic $|\psi\rangle$, constructing a quantum circuit for a unitary $U$ such that $U|0\rangle = |\psi\rangle$ requires $\Theta(d^n)$
two-qudit gates.

2. Constructing a quantum circuit for a generic $n$-qudit unitary operator $U \in U(d^n)$ consisting of two-qudit
gates requires $\Theta(d^{2n})$ two-qudit gates.

As a remark, other gate libraries that are asymptotically equivalent to two-qudit gates might be better suited to 
certain problems. In reasonable cases, there should be a fixed upper bound on the number of library-gates required 
to realize a two-qudit unitary operator. For any such library a sharp asymptotic of $\Theta(d^{2n})$ gates is likewise required
for generic unitary evolution. This holds in particular for $\{\text{local unitary}\} \cup \{\wedge_1(\sigma^x \oplus I_{d-2})\}$ [17].

3 Qubit Emulation is Insufficient 

Consider two emulation schemes of qudits by qubits: 

1. One might emulate each individual qudit with as few qubits as possible, so that the local qudit structure is
respected.

2. One might rather pack the entire $d^n$ dimensional $n$-qudit state into the smallest possible qubit state space,
ignoring the local (tensor) structure.

We argue that the emulation circuit for Option 1 does not attain the lower bound asymptotic, while in essence
Option 2 does not allow for circuit-level emulation at all.

In Option 1, label $\beta = \lceil \log_2 d \rceil$, so that $\beta$ qubits are required to emulate a qudit. Now for the qubit circuit
diagram, some multi-qubit gates will in fact be local to the qudit, while others are genuine two-qudit gates. Hence,
if $U$ is a $d^n \times d^n$ unitary matrix and the $O(2^{2\beta n})$ circuit is applied after splitting each qudit into $\beta$ virtual qubits,
we obtain an upper bound of $O(2^{2\beta n})$ two-qudit gates. Note that this asymptotic is worse than both $O(d^{2n})$ and
even $O(n^k d^{2n})$ unless $d$ is a power of two. (For $2^{2\beta} > d^2$ when $d$ is not a power of two, so the exponentials have distinct bases and
are not asymptotically equivalent.) Thus, prior art does not suffice for the upper bound asymptotic.
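A small numerical sketch may help to see how quickly the per-qudit emulation falls behind. The function names are illustrative assumptions; the ratio computed is $2^{2\beta n} / d^{2n}$ with $\beta = \lceil \log_2 d \rceil$, ignoring all constant factors.

```python
from fractions import Fraction

def virtual_qubits_per_qudit(d):
    """beta = ceil(log2 d), computed exactly with integer arithmetic (d >= 2)."""
    return (d - 1).bit_length()

def emulation_overhead(d, n):
    """Exact ratio 2^(2*beta*n) / d^(2n) between the emulated and the optimal scale."""
    beta = virtual_qubits_per_qudit(d)
    return Fraction(2 ** (2 * beta * n), d ** (2 * n))

# For qutrits the ratio is (16/9)^n and grows without bound;
# for d a power of two it is exactly 1 for every n.
qutrit_ratios = [emulation_overhead(3, n) for n in (1, 2, 4, 8)]
```

For $d = 3$ the ratio is $(16/9)^n$, so the two asymptotics separate exponentially in $n$, while for $d = 4$ the ratio stays at 1.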

We next consider Option 2. Note that $n$ qudits may be viewed as $d^n$ Hilbert space dimensions. Ignoring
the local structure, a unitary evolution of $\mathcal{H}(n,d)$ may be realized as a subblock of a unitary evolution of
$n_\delta = \lceil n \log_2 d \rceil$ qubits rather than $n\beta = n \lceil \log_2 d \rceil$ as above. Indeed, with this form of emulation, it is true that
$O(4^{n_\delta}) = O(d^{2n})$ virtual two-qubit gates would suffice by earlier methods. However, in this mode of emulation a virtual
two-qubit gate need not correspond to a two-qudit gate. Indeed, it might not even be a $k$-qudit gate for $k$ small.
Consider for example a two-qutrit gate of the form $I_3 \otimes V$ acting on $\mathcal{H}(3,3)$, where $V \in U(3^2)$. This has a $9 \times 9$ block
structure, but emulating such a unitary using qubits is more or less an arbitrarily difficult 5-qubit evolution. It is
certainly not a two-qubit gate! Thus, although the mapping between Hilbert spaces is possible, tensor (Kronecker)
product structures are not preserved.
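The two qubit counts in play can be compared directly. This is a minimal sketch with hypothetical helper names; both functions use exact integer arithmetic rather than floating-point logarithms.

```python
def packed_qubits(d, n):
    """ceil(n * log2 d): fewest qubits whose state space holds d**n dimensions."""
    return (d ** n - 1).bit_length()

def per_qudit_qubits(d, n):
    """n * ceil(log2 d): qubits used when each qudit is emulated separately."""
    return n * (d - 1).bit_length()

# Three qutrits span 27 dimensions: 5 qubits when packed, 6 when emulated per qudit.
three_qutrits = (packed_qubits(3, 3), per_qudit_qubits(3, 3))
```

The packed count of 5 for three qutrits matches the 5-qubit evolution mentioned in the example above.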
There are several candidate systems for quantum computation where the physical subsystems encoding the
quantum information have dimension $d > 2$. Examples include charge-position states in quantum dots [18],
rotational and vibrational states of a molecule [19], truncated subspaces of harmonic oscillator states [20] and ground
electronic states of alkali atoms with total spin $F > 1/2$ [21]. Moreover, it is useful to allow $d$ not a power of
two. First, in many instances, the physics of the system can preclude encoding in a Hilbert space of arbitrary size.
For example, in the case of encoding in alkali atoms the Hilbert space dimension of a single hyperfine manifold is
$2F + 1$ so for bosonic atoms, $d$ is never a power of two.$^1$ Second, there is evidence that the fault-tolerant threshold
for quantum computation can be improved when using error correction codes on qudits with $d$ prime [22].

4 Controlled One-Qudit Operators 

Although the complexity bound is phrased in terms of two-qudit operators, our QR factorization algorithm in §6
produces a quantum circuit of operators that act on one target qudit depending on the state of multiple control
qudits. The majority of the $k$-controlled one-qudit operators are doubly ($k = 2$) or singly ($k = 1$) controlled. We
next review how a $k$-controlled qudit operator may be realized in $O(k)$ two-qudit gates, given $r = \lceil (k-1)/(d-2) \rceil$
ancilla qudits.

Qudit Generalizations of CNOT 

The most common two-qubit gate is the quantum controlled-not, due to its appearance in early papers on quantum
computing and reversible classical computation. In case $d = 2$, this gate, denoted CNOT or $\wedge_1(\sigma^x)$, linearly extends
the action of the classical CNOT on bit-strings to two-qubit kets. Thus CNOT applies a NOT ($\sigma^x$) iff the control
bit is in state $|1\rangle$. So in two qubits with control on the most significant qubit, CNOT linearly extends $|00\rangle \mapsto |00\rangle$,
$|01\rangle \mapsto |01\rangle$, $|10\rangle \mapsto |11\rangle$, and $|11\rangle \mapsto |10\rangle$. An extension to arbitrary $d$ has been suggested [17]. We may view $\sigma^x$
as addition mod 2, which generalizes as follows. If $c \in \mathbb{Z}/d\mathbb{Z}$ is a dit, then we use $(\oplus c)$ to (abusively) denote both
the addition map $k \mapsto k \oplus c$ within $\mathbb{Z}/d\mathbb{Z}$ and also the one-qudit unitary operator given by the permutation matrix
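of this map. So for example in qutrits ($d = 3$),

$$(\oplus 1) = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \qquad (\oplus 2) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.$$

A corresponding (unitary) permutation map INC is given by $\mathrm{INC}\,|j\rangle = |(j+1) \bmod d\rangle$ for any base $d$.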

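Then the CINC (controlled-increment) gate applies INC iff the control qudit is in state $|d-1\rangle$, i.e. in the case
of most-significant qudit control

$$\mathrm{CINC}\,|j\rangle|k\rangle = \begin{cases} |j\rangle\,|(k+1) \bmod d\rangle, & j = d-1, \\ |j\rangle\,|k\rangle, & \text{else}. \end{cases}$$

We take the symbol $\oplus$ in a circuit diagram to designate modular increment INC as in the $d = 2$ case, so that the
usual symbol for CNOT in a $d$-level diagram now designates CINC.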
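Since INC and CINC permute computational basis states, their action can be sketched on dit labels alone. The function names below are illustrative, and the control is taken to be the first argument (the most significant qudit).

```python
def inc(j, d):
    """INC |j> = |(j + 1) mod d>."""
    return (j + 1) % d

def cinc(control, target, d):
    """CINC: apply INC to the target dit iff the control dit equals d - 1."""
    if control == d - 1:
        return control, inc(target, d)
    return control, target

def plus_matrix(c, d):
    """Permutation matrix of k -> (k + c) mod d; entry [row][col] is 1 iff row == (col + c) % d."""
    return [[1 if row == (col + c) % d else 0 for col in range(d)] for row in range(d)]

# For d = 2, CINC reproduces the CNOT truth table on (control, target) pairs.
cnot_table = {(a, b): cinc(a, b, 2) for a in (0, 1) for b in (0, 1)}
```

For $d = 2$ the table maps `(1, 0)` to `(1, 1)` and `(1, 1)` to `(1, 0)` while fixing the other two inputs, which is the CNOT action described above.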

Using the most recent argument [17], a second generalization of CNOT must be added to the qudit local
unitary group $U(d)^{\otimes n}$ in order to recover exact universal qudit computation. For this, we label $\sigma^x \oplus I_{d-2}$ as
that computation with $|0\rangle \leftrightarrow |1\rangle$ and all other $|j\rangle \mapsto |j\rangle$, $2 \le j \le d-1$. Then the appropriate second gate is
$\wedge_1(\sigma^x \oplus I_{d-2})$. Note that a CINC gate may be constructed using INC gates and $d-1$ copies of $\wedge_1(\sigma^x \oplus I_{d-2})$
[17]. Since a QR argument ibid. also produces any two-qudit operator with at most $O(d^2) = O(1)$ gates from
the library $\{\text{local unitary}\} \cup \{\wedge_1(\sigma^x \oplus I_{d-2})\}$, the optimal asymptotics of the Theorem apply equally well to this
library.

$^1$ The dimension of the total ground state Hilbert space including both manifolds corresponding to the two spin states of the valence electron
may be a power of two, e.g. $^{87}$Rb and $^{133}$Cs.

Figure 1: Qutrit ($d = 3$) emulation of $\wedge([d_1 d_2 d_3 T], V)$ using at most two-qudit operations. The general argument
shows that $\wedge(C,V)$ requires at most $O(\#C)$ two-qudit gates. The rightmost circuit diagram is well-known [12,
Figure 1]. Also, the controlled-$V$ may be decomposed into local operators, CINCs, and $\wedge_1(\sigma^x \oplus I_{d-2})$ if the finer
gate library is of interest [17].



Emulation of multiple-controlled operations by single control operations 

The complexity of a quantum algorithm is determined by the asymptotic number of single qudit and two-qudit 
gates necessary to implement the corresponding unitary. We describe here how to emulate controlled one-qudit 
operations using only single qudit and two-qudit gates. In n qudits, a controlled one-qudit operator V is applied 
to a target qudit based on a string of $n - 1$ controls. Each control is either $*$, to denote a match with an arbitrary
value (no control), or is chosen to be one of $0, 1, \ldots, d-1$, to force a specific matching value (control). Note that
single qudit control and local operations may be used to emulate multiple-controlled gates at low cost. In circuit
diagrams, we will denote a control triggering on an arbitrary state $|j\rangle$ with a box, in contrast to the standard control
denoted by a bullet that only triggers on state $|d-1\rangle$. One formal definition of a controlled one-qudit gate is the
following: 

Definition 4.1 [Controlled one-qudit operator $\wedge(C,V)$] Let $V$ be a $d \times d$ unitary matrix, i.e. a one-qudit operator.
Let $C = [C_1 C_2 \cdots C_n]$ be a length-$n$ control word composed of letters from the alphabet $\{0, 1, \ldots, d-1\} \cup \{*\} \cup \{T\}$,
with exactly one letter in the word being $T$. By $\#C$ we mean the number of letters in the word with numeric
values (i.e., the number of controls), and the set of control qudits is the corresponding subset of $\{1, 2, \ldots, n\}$
denoting the positions of numeric values in the word. We will say that a control word matches an $n$-dit string if
each numeric value matches. Then the controlled one-qudit operator $\wedge(C,V)$ is the $n$-qudit operator that applies $V$
to the qudit specified by the position of $T$ iff the control word matches the data state's $n$-dit string. More precisely,
in the case when $C_n = T$, then

$$\wedge([C_1 C_2 \ldots C_{n-1} T], V)\,|c_1 c_2 \ldots c_n\rangle = \begin{cases} |c_1 \ldots c_{n-1}\rangle \otimes V|c_n\rangle, & c_k = C_k \text{ or } C_k = *,\ 1 \le k \le n-1, \\ |c_1 \ldots c_{n-1} c_n\rangle, & \text{else}. \end{cases} \tag{5}$$

Alternatively, if $C_j = T$ ($j < n$), we consider the unitary (permutation) operator $\chi_j^n$ that swaps qudits $j$ and $n$.
Thus, $\chi_j^n\,|d_1 d_2 \ldots d_n\rangle = |d_1 d_2 \ldots d_{j-1} d_n d_{j+1} \ldots d_{n-1} d_j\rangle$. We remark that $\chi_j^n = (\chi_j^n)^\dagger = (\chi_j^n)^{-1}$. We apply the
same permutation to $C = [C_1 C_2 \ldots C_{j-1} T C_{j+1} \ldots C_n]$, obtaining $\tilde{C} = [C_1 C_2 \ldots C_{j-1} C_n C_{j+1} \ldots C_{n-1} T]$, and we define

$$\wedge(C,V) = \chi_j^n\, \wedge(\tilde{C},V)\, \chi_j^n.$$
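The definition can be exercised on basis states with a short sketch. Control words are written as strings such as `"2*T"`, which assumes $d \le 10$ so that each dit is one character; the function names and the dictionary representation of the output state are illustrative choices, not the paper's.

```python
def matches(word, dits):
    """True when every numeric letter of the control word equals the corresponding dit."""
    return all(letter in ("*", "T") or int(letter) == dit
               for letter, dit in zip(word, dits))

def apply_controlled(word, V, dits):
    """Image of the basis state |dits> under the controlled operator, as {label: amplitude}."""
    dits = tuple(dits)
    if not matches(word, dits):
        return {dits: 1}                      # controls do not match: state is unchanged
    t = word.index("T")                       # target position, wherever T sits in the word
    # Column dits[t] of V holds the amplitudes of V|c_t>.
    return {dits[:t] + (k,) + dits[t + 1:]: V[k][dits[t]]
            for k in range(len(V)) if V[k][dits[t]] != 0}

# Example with V the qutrit increment (a permutation matrix).
INC3 = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
fired = apply_controlled("2*T", INC3, (2, 0, 1))
idle = apply_controlled("2*T", INC3, (1, 0, 1))
```

Handling the target at an arbitrary position directly, as done here, gives the same operator as conjugating by the swap $\chi_j^n$ in the definition.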

We note an earlier simulation [12] of $\wedge(C,V)$ in terms of two-qudit operations. In addition to $n$ data qudits,
we also require $r = \lceil (n-2)/(d-2) \rceil$ [12] ancilla qudits initially set to $|0\rangle$, as illustrated in Figure 1 for $\#C = 3$
and qutrits ($d = 3$). The idea is to use local operations to control on any logical basis state. Then a sequence
of CINCs appropriately targeting the $r$ ancillas changes the state of the last ancilla to $|d-1\rangle$ if and only if each
control line carries $|d-1\rangle$. The entire operation follows by applying a singly-controlled $V$ using this last ancilla
and then mirroring the CINC pattern in order to disentangle the ancilla qudits from the data qudits.
5 Asymptotically Optimal Qudit State-Synthesis



The key component of our universal qudit circuit is a subcircuit interesting in its own right: given a state vector
$|\psi\rangle \in \mathcal{H}(n,d) = \mathbb{C}^{d^n}$, we construct a sequence of $p$ controlled one-qudit operators depending on $|\psi\rangle$ such that

$$\prod_{k=1}^{p} \wedge[C(p-k+1), V(p-k+1)]\,|\psi\rangle = |0\rangle. \tag{6}$$

We remark that we use $V(p-k+1)$ instead of $V_{p-k+1}$, since the latter sort of subscript is often used to denote a
target qudit while we intend the target to be labelled by the $T$ symbol within the word $C(p-k+1)$.

Before continuing to construct this component of our universal circuit, we note that this subcircuit achieves
asymptotically optimal qudit state-synthesis. The state-synthesis problem is to construct efficient (small) quantum
circuits whose associated unitary $U$ has $U|0\rangle = |\psi\rangle$ for some arbitrary but pre-determined $|\psi\rangle \in \mathcal{H}(n,d)$. For
$d = 2$, several works address this topic, e.g. [25, 26, 27, 28]. For our subcircuit in Equation (6), note that

$$\prod_{k=1}^{p} \wedge[C(k), V(k)^\dagger]\,|0\rangle = |\psi\rangle. \tag{7}$$

Our construction realizes any $|\psi\rangle$ in $p = (d^n - 1)/(d-1) \in O(d^n)$ two-qudit gates, since also $\#C(k) \le 1$ for all
$k$. Given the $\Omega(d^n)$ lower bound of §2, we conclude that qudit state synthesis generically requires
$\Theta(d^n)$ gates.

Finally, we briefly note our decision to index the sequence of $\wedge(C,V)$ so that the earlier indices appear on the
right. There are two reasons for this. First, it means the index describes the operators in the order in which they
are applied to $|\psi\rangle$, rather than the reverse. Second, the state-synthesis has received more attention than generalized
Householder reductions in the literature, and note that the indices of Equation (7) do increase to the right.



One-qudit Householder Reflections

Earlier universal $d = 2$ circuits [15] relied on a QR factorization to write any unitary $U$ as a product of Givens
rotations, realized in the circuit as $k$-controlled unitaries [16]. Such Givens rotations $V$ coincide with the identity
matrix except in the pairwise intersection of rows $j, k$, with columns $j, k$. Here, the entries $V_{jj}, V_{jk}, V_{kj}, V_{kk}$
mimic those of a $2 \times 2$ unitary matrix. Thus, a Givens rotation is geometrically a rotation in the $jk$-plane. In
the multi-level case, we use Householder reflections [23, §5.1] instead of Givens rotations, in order to take full
advantage of the range of single qudit operators.

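Thus, suppose $|\psi\rangle \in \mathcal{H}(1,d)$, perhaps not normalized, and suppose we wish to construct a unitary operator
$W$ such that $W|\psi\rangle$ is a multiple of $|0\rangle$. Standard formulas exist for constructing such $W$ for real vectors. For a
complex vector, writing $\langle 0|\psi\rangle = |\langle 0|\psi\rangle|\, e^{i\theta}$, these formulas become

$$\begin{cases} |\eta\rangle = |\psi\rangle - \sqrt{\langle\psi|\psi\rangle}\; e^{i\theta}\, |0\rangle \\ W = I_d - (2/\langle\eta|\eta\rangle)\, |\eta\rangle\langle\eta| \end{cases} \tag{8}$$

Then indeed $W|\psi\rangle$ is a multiple of $|0\rangle$.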
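A one-qudit Householder reflection is easy to check numerically. The sketch below assumes one standard complex convention, $|\eta\rangle = |\psi\rangle - e^{i\theta}\sqrt{\langle\psi|\psi\rangle}\,|0\rangle$ with $\theta = \arg\langle 0|\psi\rangle$, and falls back to the identity when $|\psi\rangle$ is already a multiple of $|0\rangle$; the function names and the tolerance are illustrative choices.

```python
import math

def householder(psi):
    """Return W as a list of rows with W psi = e^{i theta} ||psi|| e_0, theta = arg(psi[0])."""
    d = len(psi)
    norm = math.sqrt(sum(abs(a) ** 2 for a in psi))
    phase = psi[0] / abs(psi[0]) if psi[0] != 0 else 1.0
    eta = [complex(a) for a in psi]
    eta[0] -= phase * norm                      # eta = psi - e^{i theta} ||psi|| e_0
    eta_sq = sum(abs(a) ** 2 for a in eta)
    identity = [[1.0 + 0j if r == c else 0j for c in range(d)] for r in range(d)]
    if eta_sq <= 1e-24 * norm ** 2:             # psi already along e_0 (or zero): nothing to do
        return identity
    return [[identity[r][c] - 2 * eta[r] * eta[c].conjugate() / eta_sq
             for c in range(d)] for r in range(d)]

def apply(W, v):
    """Matrix-vector product W v."""
    return [sum(W[r][c] * v[c] for c in range(len(v))) for r in range(len(W))]

psi = [1 + 2j, -0.5j, 3.0]
reduced = apply(householder(psi), psi)          # only the first entry should survive
```

The phase factor is what makes the reflection work for complex $\langle 0|\psi\rangle$; without it the identity $2\langle\eta|\psi\rangle = \langle\eta|\eta\rangle$ used by the reflection fails.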



n-qudit State-Synthesis 

We next describe the algorithm for realizing $\prod_{k=1}^{p} \wedge[C(p-k+1), V(p-k+1)]\,|\psi\rangle = |0\rangle$ with $\#C(k) \le 1$ and
$p = (d^n - 1)/(d-1)$. The circuit topology has a recursive structure that we abstract into the following algorithm
that generates the ♣-sequence ("club-sequence").
n    ♣-sequence, d = 3

1    ♣

2    0♣, 1♣, 2♣, ♣♣

3    00♣, 01♣, 02♣, 0♣♣, 10♣, 11♣, 12♣, 1♣♣, 20♣, 21♣, 22♣, 2♣♣, ♣♣♣

4    000♣, 001♣, 002♣, 00♣♣, 010♣, 011♣, 012♣, 01♣♣, 020♣, 021♣, 022♣, 02♣♣, 0♣♣♣,
     100♣, 101♣, 102♣, 10♣♣, 110♣, 111♣, 112♣, 11♣♣, 120♣, 121♣, 122♣, 12♣♣, 1♣♣♣,
     200♣, 201♣, 202♣, 20♣♣, 210♣, 211♣, 212♣, 21♣♣, 220♣, 221♣, 222♣, 22♣♣, 2♣♣♣, ♣♣♣♣

Figure 2: Sample ♣-sequences for $d = 3$, i.e. qutrits.



Algorithm 1: $\{s_1, \ldots, s_p\}$ = Make-♣-sequence($d$, $n$)

% We return a sequence of $p = (d^n - 1)/(d-1)$ terms, with $n$ letters each,
% drawn from the alphabet $\{0, 1, \ldots, d-1, \clubsuit\}$.
Let $\{\tilde{s}_j\}_{j=1}^{\tilde{p}}$ = Make-♣-sequence($d$, $n-1$) (an empty sequence when $n - 1 = 0$).
for $q = 0, 1, \ldots, d-1$ do
    The next $(d^{n-1} - 1)/(d-1)$ terms of the sequence are formed by prefixing the letter $q$ to each
    term of the sequence $\{\tilde{s}_j\}$.
end for
The final term of the sequence is $\clubsuit^n$.
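The recursion translates almost line for line into code. This is a minimal sketch: terms are represented as strings, which assumes $d \le 10$, and the empty sequence for $n = 0$ is an assumed base case that the listing leaves implicit.

```python
CLUB = "\u2663"  # the club symbol used as the wildcard letter of the sequence

def make_club_sequence(d, n):
    """Return the club-sequence for n qudits of dimension d as a list of strings."""
    if n == 0:
        return []                              # assumed base case: nothing to prefix
    shorter = make_club_sequence(d, n - 1)
    # Prefix each letter q = 0, ..., d-1 to every term of the shorter sequence ...
    sequence = [str(q) + term for q in range(d) for term in shorter]
    # ... and finish with the all-club term.
    sequence.append(CLUB * n)
    return sequence

qutrit_sequences = {n: make_club_sequence(3, n) for n in (1, 2, 3, 4)}
```

The length obeys $p(n) = d\,p(n-1) + 1$ with $p(0) = 0$, which sums to $(d^n - 1)/(d-1)$; for qutrits this gives 1, 4, 13 and 40 terms for $n = 1, \ldots, 4$.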



Sample ♣-sequences that illustrate the construction are given in Figure 2. Note that the number of elements
in the sequence equals the number of uncontrolled or singly-controlled one-qudit operators in the state-synthesis
circuit. We choose to describe the circuit by iterating over the sequence. Thus, in order to produce the circuit, it
suffices to describe how to extract the control word $C$ from a term $t$ of the ♣-sequence and how to determine $V$
from the term and $|\psi_j\rangle$, where $|\psi_j\rangle = \wedge[C(j-1), V(j-1)] \cdots \wedge[C(1), V(1)]\,|\psi\rangle$ is the partial product. This may be
done as follows.



Algorithm 2: $\wedge(C,V)$ = Single-♣Householder(♣ term $t = t_1 t_2 \ldots t_n$, $n$-qudit state $|\psi_j\rangle$)

Initialize $C = {*}{*} \cdots {*}$
% Set the target:
Let $\ell$ be the index of the leftmost ♣ and set $C_\ell = T$.
% Set a single control if needed:
if $t$ contains numeric values greater than 0,
    Let $q$ be the index of the rightmost such value and set $C_q = t_q$.
end if
Given $|\psi_j\rangle = \sum_{k=0}^{d^n-1} \langle k|\psi_j\rangle\, |k\rangle$, form a one-qudit state $|\varphi\rangle = \sum_{k=0}^{d-1} \langle t_1 t_2 \ldots t_{\ell-1}\, k\, 00 \ldots 0|\psi_j\rangle\, |k\rangle$.
Form $V$ as one-qudit Householder such that $V|\varphi\rangle$ is a multiple of $|0\rangle$.
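The control-word part of this algorithm can be sketched independently of the state. Terms and control words are strings here, which assumes $d \le 10$; the function name `control_word` is illustrative.

```python
CLUB = "\u2663"  # the club symbol

def control_word(term):
    """Control word for one club-sequence term: one target T and at most one control."""
    word = ["*"] * len(term)
    # Target: the leftmost club.
    word[term.index(CLUB)] = "T"
    # Control: the rightmost numeric letter greater than 0, if there is one.
    nonzero = [i for i, letter in enumerate(term) if letter not in (CLUB, "0")]
    if nonzero:
        word[nonzero[-1]] = term[nonzero[-1]]
    return "".join(word)

# The seven-qudit term 2100 followed by three clubs, as in the Figure 3 discussion.
example = control_word("2100" + CLUB * 3)
```

For the term 2100♣♣♣ this yields the word `*1**T**`: the target sits on line 5 and the single control is the 1 on line 2.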



Figure 3 illustrates the gate produced from the output $C$ and $V$ from the algorithm Single-♣Householder.
Figure 4 illustrates the order in which these $\wedge(C,V)$ reflections are generated if we iterate over the ♣-sequence. Each
node of the tree is labeled by a ♣-term and represents a Householder reflection defined by the three indicated
elements of $|\psi\rangle$. After the reflection, the first element in the node remains and the others are zeroed. The reflections
are applied by traversing the graph in depth-first order, left to right. To understand the controls, notice that the
leftmost Householder on each layer of the graph requires no control. For example, the Householder 0♣♣ defined by
elements $0j0$ ($j = 0, 1, 2$) is applied to 9 sets of elements: $0j1$ and $0j2$, all zeroed, and $1j0, 1j1, 1j2, 2j0, 2j1, 2j2$,
as yet not zeroed. For the other Householder nodes, the control is indicated in boldface. For the leftmost
Householder in a group of $d$ siblings, we do not wish to touch elements in groups to the left of it. So we set the control
to stay within the group. For example, the Householder labeled 10♣ is also applied to elements in 11♣ and 12♣.
For other Householders in a group, the corresponding elements in groups to the left are completely zero, and in
groups to the right are as yet unzeroed. Thus, for example, the Householder labeled 11♣ is applied to 01♣ (all
zero since 0♣♣ has already been applied) and 21♣. We can formalize this argument to a proof of correctness as
given in Appendix A.

Figure 3: Producing a $\wedge(C,V)$ given $V$ and a term of the ♣-sequence, here $t$ = 2100♣♣♣ for seven qudits. The
algorithm for producing $C$ places the $V$-target symbol $T$ on the leftmost club, here line 5. The active control must
then be placed on the least significant line carrying a nonzero prior to line 5, here the 1 on line 2. (A control on lines
3 or 4 would not prevent the nonzero $\alpha_0$ of $|\psi_j\rangle = \sum_{k=0}^{d^n-1} \alpha_k |k\rangle$ from creating new nonzero entries in previously
zeroed positions.) Thus in this case, $C = [{*}1{*}{*}T{*}{*}]$. The $V$ is chosen to zero all but one $\alpha_k$ for $k$ = 2100♣00.

Figure 4: Using the ♣-sequence for $d = 3$, $n = 3$ to generate Householder reflections to reduce $|\psi\rangle$ to a multiple of
$|0\rangle$. Each node is labeled by a ♣-term and represents a Householder reflection $\wedge(C,V)$. The control is indicated
by the boldface entry in the label. As the tree is traversed in a depth-first search, each node indicates a $\wedge(C,V)$
which zeroes the components of the last two indices in each node using the component of the top entry.
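The correctness argument can also be checked numerically by running the whole reduction on a random state. The sketch below is a plain-Python rendering under stated assumptions: states are dictionaries keyed by dit tuples, $d \le 10$, the Householder phase convention is $|\eta\rangle = |\varphi\rangle - e^{i\theta}\|\varphi\|\,|0\rangle$ with $\theta$ the phase of the first entry, and all function names are illustrative.

```python
import math
import random
from itertools import product

CLUB = "\u2663"

def make_club_sequence(d, n):
    if n == 0:
        return []
    shorter = make_club_sequence(d, n - 1)
    return [str(q) + t for q in range(d) for t in shorter] + [CLUB * n]

def control_word(term):
    word = ["*"] * len(term)
    word[term.index(CLUB)] = "T"
    nonzero = [i for i, ch in enumerate(term) if ch not in (CLUB, "0")]
    if nonzero:
        word[nonzero[-1]] = term[nonzero[-1]]
    return "".join(word)

def householder(phi):
    d = len(phi)
    norm = math.sqrt(sum(abs(a) ** 2 for a in phi))
    phase = phi[0] / abs(phi[0]) if phi[0] != 0 else 1.0
    eta = [complex(a) for a in phi]
    eta[0] -= phase * norm
    eta_sq = sum(abs(a) ** 2 for a in eta)
    if eta_sq <= 1e-24 * norm ** 2:            # already reduced: use the identity
        return [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    return [[(1.0 if r == c else 0.0) - 2 * eta[r] * eta[c].conjugate() / eta_sq
             for c in range(d)] for r in range(d)]

def apply_controlled(state, d, word, W):
    """Apply W to the target qudit on every block of amplitudes matching the controls."""
    target = word.index("T")
    out = dict(state)
    for dits in product(range(d), repeat=len(word)):
        if dits[target] != 0:
            continue                            # visit each block once, via its k = 0 member
        if any(ch.isdigit() and dits[i] != int(ch) for i, ch in enumerate(word)):
            continue
        keys = [dits[:target] + (k,) + dits[target + 1:] for k in range(d)]
        vec = [state[key] for key in keys]
        for r, key in enumerate(keys):
            out[key] = sum(W[r][c] * vec[c] for c in range(d))
    return out

def reduce_to_zero(state, d, n):
    """Iterate over the club-sequence, returning the reduced state and the gate list."""
    gates = []
    for term in make_club_sequence(d, n):
        ell = term.index(CLUB)
        prefix = tuple(int(ch) for ch in term[:ell])
        phi = [state[prefix + (k,) + (0,) * (n - ell - 1)] for k in range(d)]
        W = householder(phi)
        word = control_word(term)
        state = apply_controlled(state, d, word, W)
        gates.append((word, W))
    return state, gates

def random_state(d, n, seed=0):
    rng = random.Random(seed)
    return {dits: complex(rng.gauss(0, 1), rng.gauss(0, 1))
            for dits in product(range(d), repeat=n)}
```

Calling `reduce_to_zero(random_state(3, 3), 3, 3)` should return a state whose only non-negligible amplitude sits on `(0, 0, 0)`, together with 13 gates that each carry at most one control.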

Householder Circuits Retaining $|j\rangle \ne |0\rangle$

In the QR unitary-circuit application, we will need not only $|\psi\rangle \mapsto \sqrt{\langle\psi|\psi\rangle}\,|0\rangle$ but also $|\psi\rangle \mapsto \sqrt{\langle\psi|\psi\rangle}\,|j\rangle$ for
any $j = d_1 d_2 \ldots d_n$. Rather than provide a new algorithm, we instead adapt our algorithm for a collapse onto
$|0\rangle$ into an algorithm for collapse onto $|j\rangle$. The idea is to permute the elements to put $j$ in position 0, apply
Single-♣Householder, and then permute back. The rest of this subsection describes this in detail.

We abusively continue to use $\oplus p$ for $1 \le p \le d-1$ to denote the one-qudit unitary operator that carries
$|k\rangle \mapsto |(p+k) \bmod d\rangle$, i.e. $\oplus p = \mathrm{INC}^p$. Given the $d$-ary expansion of $j$, we have $\otimes_{k=1}^{n} [\oplus d_k]\,|00 \ldots 0\rangle = |j\rangle$.
Consider $\wedge(C,V)$, and define $\tilde{C}$ by

$$\tilde{C}_k = \begin{cases} *, & C_k = * \\ T, & C_k = T \\ (C_k + d_k) \bmod d, & C_k \in \{0, 1, \ldots, d-1\} \end{cases} \tag{9}$$
Suppose also that $C_\ell = T$. Then noting that $(\oplus j)^\dagger = \oplus(d-j)$, we have the similarity relation

$$[\otimes_{k=1}^{n} \oplus(d_k)]\; \wedge(C,V)\; [\otimes_{k=1}^{n} \oplus(d-d_k)] = \wedge[\tilde{C},\, (\oplus d_\ell)\, V\, (\oplus(d-d_\ell))] \tag{10}$$

This is the basis for the algorithm.
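The similarity relation can be checked by brute force on basis labels. The sketch below restricts the one-qudit operator to a permutation so that everything stays on dit tuples; that restriction, the string encoding of control words ($d \le 10$) and the function names are assumptions made for illustration only.

```python
from itertools import product

def shifted_control_word(word, j_dits, d):
    """Shifted control word: numeric letters are increased by the dits of j, mod d."""
    return "".join(letter if letter in ("*", "T") else str((int(letter) + dk) % d)
                   for letter, dk in zip(word, j_dits))

def controlled_perm(word, perm, dits):
    """Action on a basis label of the controlled operator whose one-qudit part is a permutation."""
    if any(l not in ("*", "T") and int(l) != c for l, c in zip(word, dits)):
        return dits
    t = word.index("T")
    return dits[:t] + (perm[dits[t]],) + dits[t + 1:]

def check_similarity(word, perm, j_dits, d):
    """Compare both sides of the similarity relation on every basis label."""
    t = word.index("T")

    def shift(dits, sign):
        return tuple((c + sign * dk) % d for c, dk in zip(dits, j_dits))

    # One-qudit operator on the right-hand side: shift down by d_t, apply V, shift up by d_t.
    conj_perm = [(perm[(k - j_dits[t]) % d] + j_dits[t]) % d for k in range(d)]
    new_word = shifted_control_word(word, j_dits, d)
    return all(shift(controlled_perm(word, perm, shift(dits, -1)), +1)
               == controlled_perm(new_word, conj_perm, dits)
               for dits in product(range(d), repeat=len(word)))

ok = check_similarity("1*T0", [1, 0, 2], (2, 1, 0, 2), 3)
```

Here the word `1*T0` with $j = 2102$ in base 3 becomes `0*T2`, and the check confirms that conjugating by the shifts moves the controls and conjugates the one-qudit operator as stated.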

Algorithm 3: $\wedge(\tilde{C}, \tilde{V})$ = ♣Householder($|\psi\rangle$, $j$, $d$, $n$)
% Reduce $|\psi\rangle$ onto $|j\rangle$.
Let $j = d_1 d_2 \cdots d_n$.

Compute $|\varphi\rangle = [\otimes_{q=1}^{n} \oplus(d-d_q)]\,|\psi\rangle$.

Produce a sequence of controlled one-qudit operators so that

    $\prod_{k=1}^{p} \wedge[C(p-k+1), V(p-k+1)]\,|\varphi\rangle = |00 \ldots 0\rangle$,

using Single-♣Householder applied to each term of Make-♣-sequence($d$, $n$).

Compute $[\otimes_{q=1}^{n} (\oplus d_q)]\; \wedge[C(p-k+1), V(p-k+1)]\; [\otimes_{q=1}^{n} (\oplus(d-d_q))] =
\wedge[\tilde{C}(p-k+1), \tilde{V}(p-k+1)]$ using Equation (10).



6 A Qudit-native QR-based Quantum Circuit Synthesis Algorithm 

The asymptotically optimal qudit-universal circuit we present does not require Gray codes (cf. [11]). Rather, it
leans heavily on the optimal state-synthesis of §5. Since this state-synthesis circuit can likewise clear any length $d^n$
vector using fewer than $d^n$ single controls, the asymptotic is perhaps unsurprising. However, the recursive nature
of our synthesis algorithm requires highly-controlled one-qudit unitary operators when clearing entries near the
diagonal, and other highly-controlled one-qudit unitaries are needed to finish clearing each column. In presenting
the algorithm, we highlight two themes:

• We process the size $d^n \times d^n$ unitary $V$ in subblocks of size $d^{n-1} \times d^{n-1}$.

• Due to rank considerations, at least one block in each block-column of size $d^n \times d^{n-1}$ must remain full rank
throughout.

Hence, we cannot carelessly zero subcolumns. One solution is to triangularize the $d^{n-1} \times d^{n-1}$ matrices on the
block diagonal, recursively. We also note that only $O(n^2 d^n)$ fully ($n-1$)-controlled one-qudit operations appear
in the algorithm, which is allowed when working towards an asymptotic of $O(d^{2n})$ controls total.

The organization for the algorithm is then as follows. Processing (triangularization) of $V$ moves along block-columns
of size $d^n \times d^{n-1}$ from left to right. In each block-column, we first triangularize the $d^{n-1} \times d^{n-1}$
block-diagonal element, perhaps adding a control on the most significant qudit to a circuit produced by recursive
triangularization. After this recursion, we zero the blocks below the block-diagonal element one column at a time.
For each column $j$, $0 \le j \le d^{n-1} - 1$, the zeroing process is to collapse the $d^{n-1} \times 1$ subcolumns onto their $j$th
entries, again adding a control on the most significant qudit to prevent destroying earlier work. These subcolumn
collapses produce the bulk of the zeroes and are done using ♣Householder. After this, fewer than $d$ entries remain
to be zeroed in the column below the diagonal. These are eliminated using a controlled reflection containing $n-1$
controls and targeting the top line. With the appropriate one-qudit Householder, the diagonal entry will zero lower
nonzero terms while the older zeroes in the column are protected by the controls on the lower $n-1$ lines.

We now give a formal statement of the algorithm. We emphasize the addition of controls when previously 
generated circuits are incorporated into the universal circuit (i.e. recursively telescoping control.) 
Algorithm 4: Triangle($U$, $d$, $n$)
if $n = 1$ then
    Triangularize $U$ using a QR reduction.
else
    Reduce the top-left $d^{n-1} \times d^{n-1}$ subblock using Triangle($*$, $d$, $n-1$) (writing output to the bottom $n-1$ circuit lines).
    for $k = 0, 1, \ldots, d-2$ do   % Block-column iteration
        for columns $j = k d^{n-1}, \ldots, [(k+1) d^{n-1} - 1]$ do
            for $\ell = 1, \ldots, (d-1-k)$ do   % Block-row iteration
                Use ♣Householder to zero the column entries $(k+\ell) d^{n-1}, \ldots, [(k+\ell+1) d^{n-1} - 1]$,
                leaving a nonzero entry at $(k+\ell) c_2 \cdots c_n$ for $j = c_1 c_2 \ldots c_n$ and
                adding a $|k+\ell\rangle$-control on the most significant qudit.
            end for
            Clear the remaining nonzero entries below the diagonal using one $\wedge[T c_2 \ldots c_n, V]$.
        end for   % All subdiagonal entries zero in block-column
        Use Triangle($*$, $d$, $n-1$) on the $d^{n-1} \times d^{n-1}$ matrix at the $(k+1)$-st block diagonal,
        adding a $|k+1\rangle$-control to the most significant qudit.
    end for
end if-else
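
The loop structure of the non-recursive part of Triangle can be sketched in a few lines of Python. This is only an illustration of the iteration order, not the authors' implementation: the function name `triangle_schedule` and the tuple layout are hypothetical, and the sketch assumes that the block-column index $k$ runs over $0, \ldots, d-2$ (the block-columns that actually have blocks below the diagonal) and that block-row $k+\ell$ is simply called `row`.

```python
def triangle_schedule(d, n):
    """Yield the top-level steps of Triangle for n >= 2 (recursion not expanded).

    Hypothetical helper: each step is a tuple
    (kind, block_column, column, block_row, surviving_row_index).
    """
    block = d ** (n - 1)  # side length of one subblock
    for k in range(d - 1):  # block-columns with blocks below the diagonal
        for j in range(k * block, (k + 1) * block):  # columns of block-column k
            for row in range(k + 1, d):  # block-rows below the block diagonal
                # Collapse rows row*block .. (row+1)*block - 1 of column j onto
                # one surviving entry; a control on the top qudit is added.
                survivor = row * block + (j % block)
                yield ("householder", k, j, row, survivor)
            # One (n-1)-controlled one-qudit operation clears the survivors.
            yield ("clear", k, j, None, None)
        # Controlled recursive triangularization of the next diagonal block.
        yield ("recurse", k, None, k + 1, None)


def count_steps(d, n):
    """Count how often each kind of step occurs."""
    counts = {"householder": 0, "clear": 0, "recurse": 0}
    for step in triangle_schedule(d, n):
        counts[step[0]] += 1
    return counts
```

Under these assumptions the schedule contains $d(d-1)d^{n-1}/2$ Householder collapses, $d^n - d^{n-1}$ final clearing operations, and $d-1$ controlled recursive calls, which are the quantities used when counting controls.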



To generate a circuit for a unitary operator $U$, we use Triangle to reduce $U$ to a diagonal operator
$W = \sum_{j=0}^{d^n-1} e^{i\theta_j} |j\rangle\langle j|$. Now $V$ and $U = WV$ would be indistinguishable if a von Neumann
measurement $\{|j\rangle\langle j|\}_{j=0}^{d^n-1}$ were made after each computation. However, the diagonal is important if $U$
is a computation corresponding to a subblock of the circuit of a larger computation with other trailing, entangling
interactions. In this case, an earlier figure makes clear how to build a circuit for a controlled relative phase
$I_{d^n} + (e^{i\theta} - 1)\,|j\rangle\langle j|$ in $O(n)$ gates. Since $W$ has only $O(d^n)$ such phases, the corresponding
circuit for $W$ costs $O(n d^n)$ two-qudit gates and as such is asymptotically irrelevant to $\Theta(d^{2n})$.



7 Counting Gates and Controls 



Let $h(n,k)$ be the number of $k$-controls required in the Single-♣Householder reduction of some $n$-qudit state $|\psi\rangle$. Then
clearly $h(n,k) = 0$ for $k \ge 2$. Moreover, each 0-control results from an element of the ♣-sequence of the form
$00\ldots0\clubsuit\clubsuit\ldots\clubsuit$, and there are $n$ such sequences. Thus, since the number of elements of the ♣-sequence is
$(d^n - 1)/(d-1)$, we see that

$$ h(n,1) = \frac{d^n - 1}{d-1} - n, \qquad h(n,0) = n. \tag{11} $$
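
As a sanity check on these counts, one can enumerate the ♣-sequence directly. The sketch below is an illustration under stated assumptions rather than the paper's code: it represents each term by its prefix of fixed dits (the remaining positions are ♣), generates the terms by a depth-first traversal that visits children before their parent, and treats a term as uncontrolled exactly when its prefix is all zeros, as in the text. The function names are hypothetical.

```python
def club_sequence(d, n):
    """Return the prefixes c_1...c_{l-1} of all club-sequence terms.

    The empty prefix stands for the term consisting only of clubs.
    Children are visited before their parent (depth-first, post-order).
    """
    terms = []

    def visit(prefix):
        if len(prefix) < n - 1:
            for c in range(d):
                visit(prefix + (c,))
        terms.append(prefix)

    visit(())
    return terms


def control_counts(d, n):
    """Return (h(n,0), h(n,1)): uncontrolled and singly-controlled operations."""
    terms = club_sequence(d, n)
    # A term of the form 00...0 followed by clubs needs no control.
    uncontrolled = sum(1 for t in terms if not any(t))
    return uncontrolled, len(terms) - uncontrolled
```

For small $d$ and $n$ the returned pair can be compared against $h(n,0) = n$ and $h(n,1) = (d^n-1)/(d-1) - n$.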

We next count controls in the matrix algorithm Triangle of §6. We break the count into two pieces: $g$ for the
work outside the main diagonal blocks and $f$ for the work within.

Let $g(n,k)$ be the number of $k$-controls applied in operations in each column that zero the matrix below the block
diagonal; this is the total work in the for $j$ loops of Triangle. We use Single-♣Householder $d(d-1)d^{n-1}/2$ times
since there are $d(d-1)/2$ blocks of size $d^{n-1} \times d^{n-1}$ below the block diagonal, and we add a single control to
those counted in $h$. The last statement in the loop is executed $d^n - d^{n-1}$ times. Therefore, letting $\delta$ be the
Kronecker delta, the counts are

$$ g(n,k) = \delta_k^{\,n-1}\,(d^n - d^{n-1}) + \tfrac{1}{2}\, d(d-1)\, d^{n-1}\, h(n-1,k-1). \tag{12} $$



Supposing $n > 3$, we see that

$$ g(n,k) = \begin{cases}
d^n - d^{n-1}, & k = n-1 \\
0, & 3 \le k < n-1 \\
\tfrac{1}{2} d^n (d^{n-1} - 1) - \tfrac{1}{2} d^n (d-1)(n-1), & k = 2 \\
\tfrac{1}{2} d^n (d-1)(n-1), & k = 1 \\
0, & k = 0
\end{cases} \tag{13} $$
Figure 5: Total number of control boxes in the new universal circuit as a function of the level $d$ and number of
qudits $n$.

Finally, let $f(n,k)$ be the total number of $k$-controlled operations in the Triangle reduction, including the block
diagonals. This work includes that counted in $g$, plus a recursive call to Triangle before the for $k$ loop, plus $(d-1)$
calls within the $k$ loop, for a total of

$$ f(n,k) = g(n,k) + f(n-1,k) + (d-1)\, f(n-1,k-1), \tag{14} $$

with $f(n,0) = 1$ and $f(1,k) = 0$ for $n, k > 0$.

Using the recursive relation of Equation 14 and the counts of Equation 13, we next argue that Triangle has no
more than $O(d^{2n})$ controls. Two lemmas are helpful.

Lemma 7.1 For sufficiently large $n$, we have $f(n,k) \le d^{2n-k+4}$.

Proof: By inspection of Equation 13, we see that $g(n,k) \le \tfrac{1}{2} d^{2n-k+2}$ for all $k$ and $n$ large. Now $f(n,0) = 1$,
which we take as an inductive hypothesis while supposing $f(n-1,\ell) \le d^{2n-2-\ell+4} = d^{2n-\ell+2}$. Thus, using the
recursion relation of Equation 14,

$$ f(n,k) \le \tfrac{1}{2} d^{2n-k+2} + d^{2n-k+2} + (d-1)\, d^{2n-k+3}
 = d^{2n-k+4} \left( \frac{1}{2d^2} + \frac{1}{d^2} + 1 - \frac{1}{d} \right). \tag{15} $$

Now since $d > 3/2$, we must have $\frac{1}{d} > \frac{3}{2d^2}$, whence an inductive proof of the result. □

Lemma 7.2 $\sum_{k=0}^{n-1} k\, f(n,k) \in O(d^{2n})$.

The proof of Lemma 7.2 follows from checking $\sum_{k=0}^{n-1} k\, d^{2n-k} \in O(d^{2n})$. The latter fact follows from either
explicitly computing the sum by deriving the appropriate geometric series or alternately using integral comparison.
Thus, the total number of control boxes in the circuit diagram grows as $O(d^{2n})$. The theorem of the introduction
asserting a size $O(d^{2n})$ universal circuit composed of two-qudit gates follows, given the earlier commentary on
decomposing a $k$-controlled one-qudit operator into two-qudit gates.
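
For reference, the geometric-series route can be written out as follows. This is a sketch that assumes the bound of Lemma 7.1 may be applied to every term of the sum, and it uses the standard identity $\sum_{k \ge 0} k x^k = x/(1-x)^2$ for $|x| < 1$ with $x = 1/d$.

```latex
\begin{align*}
\sum_{k=0}^{n-1} k\, f(n,k)
  &\le \sum_{k=0}^{n-1} k\, d^{2n-k+4}
   = d^{2n+4} \sum_{k=0}^{n-1} k\, d^{-k} \\
  &\le d^{2n+4} \sum_{k=0}^{\infty} k\, d^{-k}
   = d^{2n+4}\, \frac{1/d}{(1 - 1/d)^{2}}
   = \frac{d^{5}}{(d-1)^{2}}\; d^{2n} .
\end{align*}
```

Since $d$ is fixed, the prefactor $d^5/(d-1)^2$ is a constant, so the sum lies in $O(d^{2n})$.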

Figure 5 shows actual counts of control boxes for specific instances of $d$, $n$. These are illuminating given that
Lemma 7.1 overestimates the number of $k$-controls for most $k$. These counts are obtained using a C++ implementation
of the recursion presented in this section and have been verified by an explicit MATLAB implementation of
the entire circuit synthesis algorithm for small $d$, $n$.
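
The recursion of this section is short enough to restate in code. The following Python sketch is not the C++ or MATLAB implementation mentioned above; it is a minimal transcription of Equations 11, 12 and 14 under the stated boundary conditions, with hypothetical function names and with `d` passed as an explicit argument.

```python
from functools import lru_cache


def h(n, k, d):
    """k-controls in the Single-club Householder reduction (Equation 11)."""
    if k == 0:
        return n
    if k == 1:
        return (d ** n - 1) // (d - 1) - n
    return 0  # no operation carries two or more controls (or a negative number)


def g(n, k, d):
    """k-controls used below the block diagonal (Equation 12)."""
    final_clears = d ** n - d ** (n - 1) if k == n - 1 else 0
    householder_calls = d * (d - 1) * d ** (n - 1) // 2
    return final_clears + householder_calls * h(n - 1, k - 1, d)


@lru_cache(maxsize=None)
def f(n, k, d):
    """k-controlled operations in Triangle (Equation 14 with its boundary values)."""
    if k < 0:
        return 0
    if k == 0:
        return 1
    if n == 1:
        return 0
    return g(n, k, d) + f(n - 1, k, d) + (d - 1) * f(n - 1, k - 1, d)


def control_boxes(n, d):
    """Total number of control boxes: sum over k of k * f(n, k)."""
    return sum(k * f(n, k, d) for k in range(n))
```

For instance, `control_boxes(2, 2)` evaluates to 5: two controlled Householder collapses, two final clearing operations, and one controlled recursive call on the lower diagonal block.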



8 Conclusions 

We conclude with some remarks. Locality in quantum mechanics is a function of the tensor (Kronecker) product
structure of the state space in question. In quantum computing, the Hilbert space factors are often finite
dimensional. Measuring difficulty by counting two-particle interactions, we have generalized a recent optimal
asymptotic of $\Theta(2^{2n})$ for two-level quantum bits to a new optimal asymptotic $\Theta(d^{2n})$ for $d$-level quantum dits.
The result is exponentially better (asymptotically) than that obtained by emulating such qudits with qubits, given
$d \ne 2^{\ell}$. This arises since the tensor decompositions are incompatible, except in the case that $d$ is a power of 2.

Multi-level quantum logics have been proposed as an alternative to qubits due to the trade-off in the tensor
structure. For $d > 2$, there is a larger space of local operations, and fewer entangling gates might be required to
realize a target quantum computation (unitary evolution) [9]. This work has moreover demonstrated that such a
benefit does not scale with the number of particles $n$, but rather must consist (at most) of a constant factor reduction
in the number of required entangling gates. However, our result only applies to symmetry-less evolutions, and
particular computations might be better suited to certain multi-level and tensor structures on Hilbert space than
others.
A Proof of Correctness for State-Synthesis

We sketch the proof of correctness of the Algorithm for state-synthesis employed to attain Equation 5:

$$ \prod_{k=1}^{p} \wedge[C(p-k+1), V(p-k+1)]\; |\psi\rangle = |0\rangle \tag{16} $$
Given the Algorithm, $p = (d^n - 1)/(d-1)$ is the number of elements of the ♣-sequence. Suppose for clarity
that $|\psi\rangle$ is generic, so that no amplitudes (components) are zero at the outset. Then it would suffice to prove

(i) that each operator $\wedge[C(j),V(j)]$ introduces $d-1$ new zeroes into the state not present in $|\psi_j\rangle$ and

(ii) that $\wedge[C(j),V(j)]$ does not act on previously zeroed entries.

The assertion (i) is straightforward and left to the reader. However, the second assertion is false. Rather, the
controlled one-qudit operators do act on previously zeroed entries, but they act in such a way that only linear
combinations of these zeroes are ever introduced as new amplitudes. The discussion below makes this assertion
precise and proves it.

To facilitate this, we label $S = \{0, 1, \ldots, d^n - 1\}$ the index set and introduce the notation $S_*(j)$ for the set of
component indices of $|\psi_j\rangle$ that have not explicitly been reduced to a zero by some $\wedge[C(k),V(k)]$, $k < j$. We label
$S[C(j)]$ to be the set of control indices of $C(j)$, per Definition 4.1. Also, define $\ell$ by $C(j)_\ell = T$. Now there is a
group action of $\mathbb{Z}/d\mathbb{Z}$ on the index set $S$ corresponding to addition mod $d$ on the $\ell$-th dit:

$$ c \bullet_\ell c_1 c_2 \ldots c_n = c_1 c_2 \ldots c_{\ell-1}\,(c_\ell + c \bmod d)\,c_{\ell+1} \ldots c_n \tag{17} $$

Since the operator $V(j)$ is applied to qudit $\ell$, the amplitudes (components) of $\wedge[C(j),V(j)]\,|\psi_j\rangle$ are either
equal to the corresponding amplitude of $|\psi_j\rangle$ or else are linear combinations of the $|\psi_j\rangle$-amplitudes whose
indices lie in the $\mathbb{Z}/d\mathbb{Z}$ orbit contained in $S[C(j)]$. Formally, we have proven the following Proposition.

Proposition A.1 Suppose

$$ (\mathbb{Z}/d\mathbb{Z}) \bullet_\ell \big( S_*(j) \cap S[C(j)] \big) \subseteq S_*(j) \cap S[C(j)]. \tag{18} $$

(We remark that should the inclusion hold, then it is an equality.) Then $\wedge[C(j),V(j)]\,|\psi_j\rangle$ has at least $d-1$
more zero amplitudes than $|\psi_j\rangle$.

The final question is how one proves the appropriate set inclusions. The point is to carefully understand the
structure of $S_*(j)$. We will eventually prove that $S_*(j)$ is the union of the three sets $R_1(j)$, $R_2(j)$, and $R_3(j)$ below.
However, we define them independently, as the induction technically requires the decomposition at the $j$-th step to
avoid mixing as the next operator is applied.

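Definition A.2 Suppose the $j$-th term of the ♣-sequence is given by $c_1 c_2 \ldots c_{\ell-1} \clubsuit\clubsuit\ldots\clubsuit$. We have $C(j)$ the
corresponding control word, with $C(j)_\ell = T$. Consider the following three sets, noting $R_1(j)$ may be vacuous.

$$ \begin{aligned}
R_1(j) &= \bigcup_{q=0}^{\ell-2} \{\, c_1 c_2 \ldots c_q\, k\, 00\cdots0 \;;\; k < c_{q+1},\ k \in \{0,1,\ldots,d-1\} \,\} \\
R_2(j) &= \{\, c_1 \cdots c_{\ell-1}\, k\, 00\ldots0 \;;\; k \in \{0,1,\ldots,d-1\} \,\} \\
R_3(j) &= \{\, f_1 \cdots f_{\ell-1}\, k_\ell k_{\ell+1} \ldots k_n \;;\; f_1 f_2 \cdots f_{\ell-1} > c_1 c_2 \cdots c_{\ell-1},\ k_\ell, \ldots, k_n \in \{0,1,\ldots,d-1\} \,\}
\end{aligned} \tag{19} $$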



Remark A.3 These sets may be interpreted in terms of Figure 4. Recall the figure recovers the ♣-sequence by
doing a depth-first search of an appropriate tree. In this context, $S_*(j)$ is the set of nonzero components of $|\psi_j\rangle$ at
the $j$-th node. The subset $R_3(j)$ results from indices that lie in nodes not yet traversed, loosely below the present
node in the tree or to the right. The set $R_2(j)$ is precisely the set of indices in the current node, node $j$. The
set $R_1(j)$ is the set of indices of elements that have been previously used to zero other elements and still remain
nonzero themselves; it is the set of indices of elements that were always at the top of nodes already traversed in
the depth-first search. Thus, $R_1(j)$ is loosely a set of entries within nodes to the left and perhaps above node $j$. ◇

Lemma A.4 Let $C(j)$, $\ell$ be as above, and label $S_*(j) = R_1(j) \cup R_2(j) \cup R_3(j)$. Then

$$ (\mathbb{Z}/d\mathbb{Z}) \bullet_\ell \big( S_*(j) \cap S[C(j)] \big) \subseteq S_*(j) \cap S[C(j)]. \tag{20} $$
Proof: Due to the choice of a single control on a dit to the right of position $\ell$ in the appropriate term of the
♣-sequence, $R_1(j) \cap S[C(j)] = \emptyset$. On the other hand, a direct computation verifies that $(\mathbb{Z}/d\mathbb{Z}) \bullet_\ell R_2(j) \subseteq R_2(j)$
and also that $R_2(j) \cap S[C(j)] = R_2(j)$.

Finally, we note that $(\mathbb{Z}/d\mathbb{Z}) \bullet_\ell R_3(j) \subseteq R_3(j)$. However, the following partition is in general nontrivial:

$$ R_3(j) = \{ R_3(j) \cap S[C(j)] \} \sqcup \{ R_3(j) \cap (S - S[C(j)]) \} \tag{21} $$

Should $C(j)$ admit no control, we are done. If not, let $m < \ell$ be the control qudit, i.e. $C(j)$ carries its single control on qudit $m$. Then

$$ R_3(j) \cap S[C(j)] = \{\, f_1 \cdots f_{\ell-1}\, k_\ell k_{\ell+1} \ldots k_n \;;\; f_m = c_m,\ f_1 f_2 \ldots f_{\ell-1} > c_1 c_2 \cdots c_{\ell-1},\ k_\ell, \ldots, k_n \in \{0,1,\ldots,d-1\} \,\} \tag{22} $$

Hence the $\mathbb{Z}/d\mathbb{Z}$ action respects the partition of Equation 21 as well. □

Lemma A.5 Let $C(j)$, $\ell$, and $S_*(j)$ be as above, with $C(j)$ resulting from $c_1 c_2 \ldots c_{\ell-1} \clubsuit\clubsuit\ldots\clubsuit$ of the ♣-sequence.
Let $Z = \{\, c_1 c_2 \ldots c_{\ell-1}\, k\, 00\ldots0 \;;\; k \in \{1,2,\ldots,d-1\} \,\}$ be the elements zeroed by $\wedge[C(j),V(j)]$. Then
$R_1(j) \cup R_2(j) \cup R_3(j) = R_1(j+1) \cup R_2(j+1) \cup R_3(j+1) \cup Z$.

Proof: We break our argument into two cases based on the value of $c_{\ell-1}$.

Case $c_{\ell-1} < d-1$: The $(j+1)$-st term of the ♣-sequence is given by $c_1 c_2 \ldots c_{\ell-2}\,(c_{\ell-1}+1)\,00\ldots0\clubsuit$. Note that for
leaves of the tree, the buffering sequence of zeroes is vacuous.

$$ \begin{aligned}
R_1(j+1) &= R_1(j) \cup R_2(j) - Z \\
R_2(j+1) \cup R_3(j+1) &= R_3(j)
\end{aligned} \tag{23} $$

Hence $R_1(j) \cup R_2(j) \cup R_3(j) = R_1(j+1) \cup R_2(j+1) \cup R_3(j+1) \cup Z$.

Case $c_{\ell-1} = d-1$: Suppose instead the $j$-th ♣-sequence term is $c_1 c_2 \ldots c_{\ell-2}\,(d-1)\,\clubsuit\ldots\clubsuit$, so that the $(j+1)$-st
term is $c_1 c_2 \ldots c_{\ell-2}\,\clubsuit\clubsuit\ldots\clubsuit$. We note that $c_1 c_2 \ldots c_{\ell-2}\,(d-1)\,0\ldots0 \in R_2(j) \cap R_2(j+1)$.² Then

$$ \begin{aligned}
R_1(j) &= R_1(j+1) \cup R_2(j+1) - \{ c_1 c_2 \ldots c_{\ell-2}\,(d-1)\,0\ldots0 \} \\
R_2(j) &= Z \cup \{ c_1 c_2 \ldots c_{\ell-2}\,(d-1)\,0\ldots0 \} \\
R_3(j) &= R_3(j+1)
\end{aligned} \tag{24} $$

From the first two, $R_1(j) \cup R_2(j) = R_1(j+1) \cup R_2(j+1) \cup Z$. Hence $R_1(j) \cup R_2(j) \cup R_3(j) = R_1(j+1) \cup R_2(j+1) \cup R_3(j+1) \cup Z$. □

Proposition A.6 $S_*(j) = R_1(j) \cup R_2(j) \cup R_3(j)$ is the set of indices of nonzero amplitudes (components) of a generic $|\psi_j\rangle$.

Proof: The proof is by induction. For $j = 1$, we have

$$ R_1(1) = \emptyset, \qquad R_2(1) = \{00\ldots0{*}\}, \qquad R_3(1) = \{\, c_1 c_2 \ldots c_{n-1}{*} \;;\; \text{some } c_j > 0 \,\} \tag{25} $$

Hence the entire index set is $S = S_*(1) = R_1(1) \cup R_2(1) \cup R_3(1)$.

Hence, we suppose by way of induction that $S_*(j) = R_1(j) \cup R_2(j) \cup R_3(j)$ and attempt to prove the similar
statement for $j+1$. Now $\wedge[C(j),V(j)]$ will add new zeroes to the amplitudes (components) with indices $Z$ by
Lemma A.5. On the other hand, $\wedge[C(j),V(j)]$ will not destroy any existing zero amplitudes, due to the
induction hypothesis, Lemma A.4 and Proposition A.1. Thus $S_*(j+1) = R_1(j+1) \cup R_2(j+1) \cup R_3(j+1)$. □
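
The set bookkeeping of this appendix can be checked mechanically for small $d$ and $n$. The Python sketch below is an illustration, not part of the original proof, and it rests on assumptions that should be read carefully: terms of the ♣-sequence are represented by their prefixes and visited depth-first with children before parents; indices are tuples of dits; and the single control of each term is assumed to sit on the last nonzero dit of its prefix (no control if the prefix is all zeros). All function names are hypothetical.

```python
from itertools import product


def club_prefixes(d, n):
    """Prefixes c_1...c_{l-1} of the club-sequence, children before parents."""
    out = []

    def visit(prefix):
        if len(prefix) < n - 1:
            for c in range(d):
                visit(prefix + (c,))
        out.append(prefix)

    visit(())
    return out


def region_sets(prefix, d, n):
    """R_1, R_2, R_3 of Definition A.2 for the term with this prefix."""
    m = len(prefix)  # m = l - 1; the target dit has 0-based position m
    r1 = {prefix[:q] + (k,) + (0,) * (n - q - 1)
          for q in range(m) for k in range(prefix[q])}
    r2 = {prefix + (k,) + (0,) * (n - m - 1) for k in range(d)}
    r3 = {idx for idx in product(range(d), repeat=n) if idx[:m] > prefix}
    return r1, r2, r3


def satisfies_control(idx, prefix):
    """Assumed rule: one control, on the last nonzero dit of the prefix."""
    nonzero = [i for i, c in enumerate(prefix) if c != 0]
    return not nonzero or idx[nonzero[-1]] == prefix[nonzero[-1]]


def check_invariant(d, n):
    """Check Proposition A.6 and the inclusion of Lemma A.4 at every step."""
    alive = set(product(range(d), repeat=n))  # generic state: nothing zero yet
    for prefix in club_prefixes(d, n):
        m = len(prefix)
        r1, r2, r3 = region_sets(prefix, d, n)
        if alive != r1 | r2 | r3:
            return False
        active = {i for i in alive if satisfies_control(i, prefix)}
        orbit = {i[:m] + ((i[m] + c) % d,) + i[m + 1:]
                 for i in active for c in range(d)}
        if orbit != active:
            return False
        # Indices Z zeroed by the current controlled Householder.
        alive -= {prefix + (k,) + (0,) * (n - m - 1) for k in range(1, d)}
    return alive == {(0,) * n}
```

Calling `check_invariant(d, n)` for a few small pairs exercises the induction step by step; the final comparison confirms that only the index $00\ldots0$ survives, in line with Equation 16.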



2 So in the application, the amplitude (component) of this index is the single amplitude not zeroed by $\wedge[C(j),V(j)]$, but it is immediately
afterwards zeroed by $\wedge[C(j+1),V(j+1)]$.
B Unitary Circuits From State-Synthesis Circuits



In this appendix, we give an alternate construction of optimal order circuits for unitary evolutions from optimal
circuits producing states per §5. We present a constructive procedure for building any unitary $U \in \mathbb{C}^{d^n \times d^n}$ from
$d^n$ copies of an optimal state-synthesis circuit and other asymptotically negligible subcircuits, using an
eigen-decomposition of $U$ [27]. If the state-synthesis circuit is any choice that contains $O(d^n)$ gates, then the resulting
$O(d^{2n})$ circuit for $U$ is optimal.

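Let $\{\lambda_j = e^{i\theta_j}\}_{j=0}^{d^n-1}$ be the eigenvalues of $U$, with $\{|\lambda_j\rangle\}_{j=0}^{d^n-1}$ a corresponding set of orthonormal eigenvectors.
We suppose circuits containing $O(d^n)$ two-qudit gates for unitary operators $W_j$ with $W_j |j\rangle = |\lambda_j\rangle$, $0 \le j \le d^n - 1$.
Then for a second set of phasing unitaries $P_j$, we may write

$$ \begin{aligned}
W_j &= |\lambda_j\rangle\langle j| + \sum_{k \ne j} |\psi_{j,k}\rangle\langle k| \\
P_j &= I_{d^n} + (e^{i\theta_j} - 1)\, |j\rangle\langle j|
\end{aligned} \tag{26} $$

Then by unitarity, $\langle \psi_{j,k} | \lambda_j \rangle = 0$ for all $j, k$. Now note that
$W_j P_j W_j^\dagger = e^{i\theta_j} |\lambda_j\rangle\langle\lambda_j| + \sum_{k \ne j} |\psi_{j,k}\rangle\langle\psi_{j,k}| = e^{i\theta_j} |\lambda_j\rangle\langle\lambda_j| + \sum_{k \ne j} |\lambda_k\rangle\langle\lambda_k|$.
Similarly by induction, the following equality may be verified: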

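$$ (W_0 P_0 W_0^\dagger)(W_1 P_1 W_1^\dagger) \cdots (W_k P_k W_k^\dagger) = \sum_{j=0}^{k} e^{i\theta_j} |\lambda_j\rangle\langle\lambda_j| + \sum_{j=k+1}^{d^n-1} |\lambda_j\rangle\langle\lambda_j| \tag{27} $$

Then considering the eigendecomposition of $U$ and taking $k = d^n - 1$, we have the following factorization:

$$ U = \prod_{j=0}^{d^n-1} W_j P_j W_j^\dagger \tag{28} $$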

Now note that the techniques of the body allow for realization of $P_j$ in $O(n)$ two-qudit gates. Thus, since by hypothesis
each $W_j$ admits a size $O(d^n)$ circuit, the circuit corresponding to Equation 28 contains $O(d^{2n})$ two-qudit gates and
is asymptotically optimal.
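
The factorization of Equation 28 is easy to test numerically. The sketch below uses only the Python standard library and makes several illustrative choices that are not from the paper: the eigenvectors are taken to be the columns of a discrete Fourier matrix, the eigenphases are arbitrary, and each $W_j$ is built as that Fourier matrix times a diagonal phase matrix, so that $W_j|j\rangle = |\lambda_j\rangle$ while the other columns $|\psi_{j,k}\rangle$ differ from $|\lambda_k\rangle$ by phases.

```python
import cmath
import math


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def diag(values):
    """Diagonal matrix with the given entries."""
    n = len(values)
    return [[values[i] if i == j else 0j for j in range(n)] for i in range(n)]


def max_abs_diff(a, b):
    """Largest entrywise deviation between two matrices."""
    n = len(a)
    return max(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))


N = 9  # d**n with d = 3, n = 2 (illustrative size)
# Column j of Q plays the role of the eigenvector |lambda_j>.
Q = [[cmath.exp(2j * math.pi * r * c / N) / math.sqrt(N) for c in range(N)]
     for r in range(N)]
thetas = [0.3 * (j + 1) ** 2 for j in range(N)]  # arbitrary eigenphases
U = matmul(matmul(Q, diag([cmath.exp(1j * t) for t in thetas])), dagger(Q))

running = diag([1 + 0j] * N)
for j in range(N):
    # W_j maps |j> to |lambda_j>; its other columns carry arbitrary phases.
    phases = [1 + 0j if k == j else cmath.exp(1j * (k + 1) * (j + 2))
              for k in range(N)]
    W = matmul(Q, diag(phases))
    # P_j = I + (e^{i theta_j} - 1)|j><j| is a single relative phase.
    P = diag([cmath.exp(1j * thetas[j]) if k == j else 1 + 0j for k in range(N)])
    running = matmul(running, matmul(matmul(W, P), dagger(W)))

error = max_abs_diff(running, U)  # expected to be at the level of rounding error
```

The choice of $W_j$ here is deliberately simple; any unitary with $W_j|j\rangle = |\lambda_j\rangle$ would serve, since only the column $|\lambda_j\rangle$ picks up the phase.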

As a remark, the circuit synthesis procedure might take $|0\rangle \mapsto |\lambda_j\rangle$ rather than $|j\rangle \mapsto |\lambda_j\rangle$. However, an $O(d^n)$
circuit for the latter extracted from an $O(d^n)$ circuit for the former follows from the similarity transform by a local
unitary per §5.



C Optimal Asymptotics for Qudit Chains 

The optimal asymptotic of $\Theta(d^{2n})$ also holds for more restrictive gate libraries reflecting a choice of architecture.
We note this in passing, focusing on the qudit chain architecture.

In the interest of being brief, we do not use formal definitions. Note that the body shows that the library
$LU \cup \{\wedge_1(\sigma_x \oplus I_{d-2})\}$ is qudit universal, where $LU = \otimes_1^n SU(d)$ and we intend any instantiation of $\wedge_1(\sigma_x \oplus I_{d-2})$
to be allowed. An architecture will here refer to a restriction on this gate library. In particular, one supposes that the
qudits correspond to the vertices of some graph, which by hypothesis is connected. Then only the instantiations of
$\wedge_1(\sigma_x \oplus I_{d-2})$ which correspond to edges of the graph are allowed. Since we may construct qudit SWAP between
qudits connected by an edge, the restricted library is also qudit-universal. However, the asymptotics of the library
gates might be different from the asymptotics of the standard gates. Loosely, instantiations of $\wedge_1(\sigma_x \oplus I_{d-2})$
between qudits $O(n)$ vertices apart will now cost $O(n)$ gates rather than one, since $O(n)$ SWAPs between adjacent
qudits are also needed.

The notion of a sub-architecture follows by comparing graphs and subgraphs. Thus, if a qudit chain is the
architecture of a linear sequence of qudits with consecutive qudits joined by edges, then the qudit chain is a
sub-architecture of a finite square, hexagonal, or cubic lattice. If a sub-architecture contains every vertex, then
the asymptotics of the larger architecture are at least as good as those of the smaller, for the inclusion only admits
more possible two-qudit gates.

Thus consider in particular a qudit chain. Suppose further the ordering of the dits implicit in earlier notations,
e.g. $d_1 d_2 d_3 \ldots d_n$, is now referring to the architecture as well. Thus given the architectural restriction, using
SWAPs we see that an instantiation of $\wedge_1(\sigma_x \oplus I_{d-2})$ costs $O(|j - k|)$ architecture gates if controlled on qudit $j$ and
targeting qudit $k$, rather than the old count of one gate. A similar comment applies to any two-qudit gate acting
between qudits $j$, $k$.

By Appendix B, the $\Theta(d^{2n})$ asymptotic follows if we show that the ♣-Householder reduction requires only
$O(d^n)$ architecture-local two-qudit gates. Hence let $a(d,n,k)$ denote the number of length $k$ singly-controlled
$\wedge_1(V)$ specified by the club sequence, where a two-qudit operator acting on qudits $j$, $\ell$ has length $|j - \ell|$. As
an example, the operator of Figure 3 is length four. Now for most $k$, a length $k$ operation within the $(n+1)$-st club
sequence results from a sequence of $k-1$ zeroes in some term of the $n$-th sequence in one of two ways:

• A length $n$ term of the form $d_2 d_3 \ldots d_\ell\, 00\ldots0\,\clubsuit\ldots\clubsuit$ is prepended to become $d_1 d_2 \ldots d_\ell\, 00\ldots0\,\clubsuit\ldots\clubsuit$.

• The length $n$ term of the form $00\ldots0\,\clubsuit\ldots\clubsuit$ is prepended to become $d_1\, 00\ldots0\,\clubsuit\ldots\clubsuit$. Here, $d_1 \ne 0$.

Noting this structure, we produce the following recursion relations, which completely determine $a(d,n,k)$:

$$ \begin{aligned}
a(d,n+1,k) &= (d-1) + d\, a(d,n,k) \\
a(d,n,0) &= n \\
a(d,n,k) &= 0 \quad \text{if } k > n-1
\end{aligned} \tag{29} $$

Now $a(d,n,0) = n$ does not factor into the recursive structure of the other $a(d,n,k)$. Rather, evaluating the
recursion explicitly for $1 \le k \le n-1$, we obtain

$$ a(d,n,k) = (d-1) \sum_{\ell=0}^{n-k-1} d^{\ell} = d^{n-k} - 1 \tag{30} $$
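
A few lines of Python suffice to confirm that the closed form agrees with the recursion. This is a hedged sketch with hypothetical function names: it unrolls the first relation of Equation 29 downward in $n$ for $k \ge 1$, compares the result with $d^{n-k} - 1$, and evaluates the weighted sum that governs the chain-architecture cost.

```python
def a_recursive(d, n, k):
    """a(d, n, k) for k >= 1, obtained by unrolling Equation 29 in n."""
    if k > n - 1:
        return 0
    return (d - 1) + d * a_recursive(d, n - 1, k)


def a_closed(d, n, k):
    """Closed form of Equation 30, valid for 1 <= k <= n - 1."""
    return d ** (n - k) - 1 if 1 <= k <= n - 1 else 0


def chain_cost(d, n):
    """Weighted count sum_k k * a(d, n, k), up to the constant hidden in O(k)."""
    return sum(k * a_closed(d, n, k) for k in range(1, n))
```

Two consistency checks follow from the text: summing $a(d,n,k)$ over $1 \le k \le n-1$ should reproduce the number $(d^n-1)/(d-1) - n$ of singly-controlled operations, and `chain_cost(d, n)` should stay below $d^{n+1}/(d-1)^2$, which is the geometric-series bound $d^n \sum_{k \ge 0} k\, d^{-k}$.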

We finally use this recursion to obtain our main result. 

Indeed, note that since a length $k$ singly-controlled operation may be realized in $O(k)$ local gates, it suffices
to prove that $\sum_{k=0}^{n} k\, a(d,n,k) = \sum_{k=0}^{n} k\,(d^{n-k} - 1)$ is a function within $O(d^n)$. This follows by either deriving the
appropriate geometric series in order to obtain the exact sum $\sum_{k=0}^{n} k\, d^{n-k}$ or alternately by integral comparison of
this second sum. Thus, even in the chain gate library, the ♣-Householder reduction requires no more than $O(d^n)$
gates and hence recovers an optimal state-synthesis asymptotic of $\Theta(d^n)$. Consequently, Appendix B produces an
asymptotic of $\Theta(d^{2n})$ chain architecture-restricted gates for any unitary evolution $U \in U(d^n)$.